Binary jumbled string matching for highly run-length compressible texts

Authors

  • Golnaz Badkobeh
  • Gabriele Fici
  • Steve Kroon
  • Zsuzsanna Lipták
Abstract

The Binary Jumbled String Matching problem is defined as follows: given a string s over {a, b} of length n and a query (x, y), where x and y are non-negative integers, decide whether s has a substring t with exactly x a's and y b's. Previous solutions created an index of size O(n) in a preprocessing step, which was then used to answer queries in constant time. The fastest algorithms for constructing this index have running time O(n^2/log n) [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or O(n^2/log^2 n) in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of s. The construction time of our index is O(n + ρ^2 log ρ), where O(n) is the time for computing the run-length encoding of s and ρ is the length of this encoding. This is no worse than previous solutions if ρ = O(n/log n) and better if ρ = o(n/log n). Our index L can be queried in O(log ρ) time. While |L| = O(min(n, ρ^2)) in the worst case, preliminary investigations have indicated that |L| may often be close to ρ. Furthermore, the algorithm for constructing the index is conceptually simple and easy to implement. In an attempt to shed light on the structure and size of our index, we characterize it in terms of the prefix normal forms of s introduced in [Fici and Lipták, DLT 2011].
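To make the problem statement concrete, here is a minimal Python sketch of the run-length encoding and of a simple quadratic-time baseline index. It is not the index L proposed in the abstract. It relies on the well-known interval property of binary strings: sliding a window of fixed length by one position changes its number of a's by at most one, so every count between the minimum and the maximum is attained. The function names run_length_encode, build_naive_index, and has_jumbled_match are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def run_length_encode(s):
    """Return the run-length encoding of s as (char, run_length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def build_naive_index(s):
    """Map each substring length m to (min, max) number of a's.

    Illustrative O(n^2) baseline, NOT the paper's construction.
    """
    n = len(s)
    # prefix[i] = number of a's in s[:i]
    prefix = [0]
    for ch in s:
        prefix.append(prefix[-1] + (ch == "a"))
    index = {0: (0, 0)}  # the empty substring
    for m in range(1, n + 1):
        counts = [prefix[i + m] - prefix[i] for i in range(n - m + 1)]
        index[m] = (min(counts), max(counts))
    return index


def has_jumbled_match(index, x, y):
    """Decide whether some substring has exactly x a's and y b's."""
    m = x + y
    if m not in index:
        return False  # longer than the text
    lo, hi = index[m]
    # Interval property: every count in [lo, hi] occurs for length m.
    return lo <= x <= hi


s = "aabab"
rle = run_length_encode(s)      # 4 runs, so rho = 4 for this string
index = build_naive_index(s)
print(rle)
print(has_jumbled_match(index, 2, 1))  # "aab" has 2 a's and 1 b
print(has_jumbled_match(index, 0, 2))  # "bb" never occurs
```

The table stores two integers per length, which matches the O(n) index size and constant query time of the earlier solutions mentioned above; only the construction here is deliberately naive.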


Similar resources

New algorithms for binary jumbled pattern matching

Given a pattern P and a text T, both strings over a binary alphabet, the binary jumbled string matching problem consists in telling whether any permutation of P occurs in T. The indexed version of this problem, i.e., preprocessing a string to efficiently answer such permutation queries, is hard and has been studied in the last few years. Currently the best bounds for this problem are O(n^2/ log...


Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings

Important papers have appeared recently on the problem of indexing binary strings for jumbled pattern matching, and further lowering the time bounds in terms of the input size would now be a breakthrough with broad implications. We can still make progress on the problem, however, by considering other natural parameters. Badkobeh et al. (IPL, 2013) and Amir et al. (TCS, 2016) gave algorithms tha...


Fast and Simple Jumbled Indexing for Binary RLE Strings

Important papers have appeared recently on the problem of indexing binary strings for jumbled pattern matching, and further lowering the time bounds in terms of the input size would now be a breakthrough with broad implications. We can still make progress on the problem, however, by considering other natural parameters. Badkobeh et al. (IPL, 2013) and Amir et al. (TCS, 2016) gave algorithms tha...


Grammar-Based Construction of Indexes for Binary Jumbled Pattern Matching

We show how, given a straight-line program with g rules for a binary string B of length n, in O(g^(2/3) n^(4/3)) time we can build a (2nH_0(B) + o(n))-bit index such that, given m and c, in O(1) time we can determine whether there is a substring of B with length m containing exactly c copies of 1. If we use O(n log n) bits for the index, then we can list all such substrings using O(m) time per substring.


Binary Jumbled Pattern Matching via All-Pairs Shortest Paths

In binary jumbled pattern matching we wish to preprocess a binary string S in order to answer queries (i, j) which ask for a substring of S that is of size i and has exactly j 1-bits. The problem naturally generalizes to node-labeled trees and graphs by replacing "substring" with "connected subgraph". In this paper, we give an n^2 / 2^Ω((log n / log log n)^(1/2)) time solution for both strings and trees. This...



Journal:
  • Inf. Process. Lett.

Volume 113

Publication date: 2013